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§0  Notation 

Throughout  this  thesis  the  following  notation  will  be  used  without  further 
comment  or  explanation. 

L2  - the  set  of  functions  on  [0,1]  which  are  Lebesgue  measurable  and 

square  integrable  equipped  with  the  usual  Banach  space  norm. 

W - for  m > 1.  The  set  of  functions  on  [0,1]  for  which  f^^^, 
m — 

j = 0,1,..., m-l  are  absolutely  continuous  and  f^"*^  is  in  L2.  This 

will  always  be  a Hilbert  space  equipped  with  the  inner  product 

<f,g>  =5  f^^^  g^^^dt 

j=0 

C - for  k ^ 0 or  infinite.  The  set  of  all  functions  on  [0,1]  which  are 
k times  continuously  differentiable. 

D - the  differentiation  operator. 

When  we  are  considering  functions  with  domains  other  than  [0,1]  the  relevant 
domain  will  be  shown  after  the  function  space  symbol  above,  e.g. 


CHAPTER  1 


Spline  Methods  in  Statistics 

§1.1  General  Introduction 

Statistics  (as  we  know  it)  began  with  fitting  parametric  models  to  data. 
Various  principles  for  making  estimates  and  inferences  were  developed  and 
refined  until  their  efficiencies  reached  their  (asymptotic)  limits  with 
methods  such  as  maximum  likelihood,  minimum  variance,  and  likelihood  ratio. 
Attention  returned  to  the  basic  models  and  there  appeared  a growing  realiza- 
tion that  not  all  of  the  parametric  structure  was  needed  to  make  inferences. 
From  this  idea  arose  the  techniques  variously  known  as  distribution  free  or 
non-parametric.  At  the  cost  of  some  loss  of  efficiency  in  certain  instances, 
these  methods  prevented  model  violations  from  being  reflected  in  false  infer- 
ences. Although  non-parametric  methods  have  had  some  remarkable  success  (for 
example,  the  theory  of  rank  tests)  they  all  too  often  ignore  useful  non-statis- 
tical  information  that  may  be  present,  and  as  a result  lose  efficiency. 

The  classical  method  of  incorporating  non-statistical  information  is  by 
means  of  the  Bayesian  framework.  In  the  absence  of  any  canonical  methods  of 
determining  and  assessing  priors,  this  has  to  be  regarded  with  suspicion. 
Happily,  there  are  at  least  three  ways  to  incorporate  validly  non-statistical 
knowledge  in  inference  procedures.  We  discuss  each  in  turn. 

The  first  occurs  when  it  is  realized  that  the  predominantly  normal  data 
contains  a certain  amount  of  contamination,  i.e.  the  normal  model  is  roughly 
correct.  This  knowledge  may  come  from  central  limit  type  considerations.  The 
robustness  methods  of  Huber  (1964)  and  Hampel  (1974)  exploit  this  knowledge. 

If  a system  is  known  to  have  an  order  structure,  this  knowledge  may  be 
exploited  by  methods  known  as  isotonic  inference.  The  book  by  Barlow,  et  al 
(1972)  reviews  most  of  these  techniques.  Knowledge  of  order  often  follows 
from  elementary  consideration  of  the  structure  of  the  system  being  modelled. 


The  third  sort  of  non-statistical  knowledge  we  can  use  is  knowledge  of 
smoothness.  The  least  action  principle  of  dynamics  suggests  that  nature  makes 
things  change  as  smoothly  as  possible.  Our  main  concern  in  this  thesis  will  bb 
with  a certain  measure  of  smoothness  and  the  consequences  of  making  the  under- 
lying function  as  smooth  as  possible.  A function  which  optimizes  a smoothness 
criterion  is  a spline.  Perhaps  a little  ironically  spline  methods  also  provide 
a sound  route  for  Bayesians  to  re-enter  the  scene,  as  Kimeldorf  and  Wahba 
(1970  and  1971)  show  there  often  exist  a Bayesian  model  which  gives  a smoothing 
spline  as  the  posterior  mean  in  that  model,  given  the  data.  Models  which 
require  an  order  structure  as  well  as  smoothness  lead  us  to  consider  isotonic 
splines.  Some  original  results  in  this  area  are  presented  in  Chapter  3. 

The  present  account  is  organized  as  follows;  The  remainder  of  Chapter  1 
is  devoted  to  those  purely  mathematical  parts  of  spline  theory  which  have 
proved  most  relevant  to  the  recent  applications  in  statistics.  Chapter  2 
reviews  the  literature  associated  with  the  main  applications  of  splines  to 
statistical  problems.  The  author's  results  on  the  existence  and  characteriza- 
tion of  isotonic  splines  are  given  in  Chapter  3. 

§1.2  Classical  Spline  Theory 

The  spline  is  the  engineer's  solution  to  a problem  frequently  concerning 
engineers.  The  problem  is  to  fit  a curve  through  points  i=l,2,...,n 

in  the  plane.  However,  the  engineer  often  needs  to  obtain  values  of  the  first 
and  second  derivatives  of  the  underlying  function  from  this  fitted  curve.  The 
spline,  an  optimal  solution  of  this  problem,  uses  the  analogy  with  weightless 
beams,  part  of  the  engineer's  stock  in  trade.  A precise  account  of  the 
engineer's  spline  follows. 
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Let  i = be  the  (error- free)  points  in  the  plane  for 

which  we  seek  an  interpolant.  Call  A = t^)  < ^2  Cj  <•  • • < Cjj  (=  t^)} 

a mesh.  For  computational  considerations  the  mesh  will  normally  just  be  the 
numbers  {t^:  i = l,...,n}. 

A (cubic)  spline  with  mesh  A,  written  S^(t),  is  a function  with  continuous 
derivatives  up  to  (and  including)  order  2 which  coincides  exactly  with  a 
(possibly  different)  cubic  function  on  each  interval  i = 1,...,N-1. 

The  points  (5^:  i = 2,...,N-l}  are  called  the  knote  of  the  spline.  A spline 
S^(t)  which  in  addition  satisfies  S^(t^)  = y^^,  i = l,2,...,n  is  an  interpolant 
for  the  data.  Some  further  restriction  is  needed  in  order  to  make  this  inter- 
polant unique  on  Although  for  certain  applications,  other  end  condi- 
tions are  more  convenient,  the  most  natural  condition  is  ~ 

This  corresponds  to  giving  the  analogous  beam  cantilevered  ends  (protruding 
beyond  the  end  points)  and  also  minimizes  the  "energy  of  flexion"  of  the  beam 
(i.e.  the  mean  square  curvature).  The  proofs  of  these  various  properties  are 
established  in  a much  general  context  by  various  contributors  to  the  theory  of 

L-splines.  See  Ahlberg,  Nilson  and  Walsh  (1967).  Notice  that  if  the  spline 

3 2 

between  and  C^i+2’^i+l^  equation  y = a^t  + b^t  + c^t  + d,,  the 

continuity  of  the  lower  derivatives  ensures  that  b^,  c.,  d.  are  constants, 
independent  of  i. 

Since  curve  fitting  is  a very  practical  task,  it  is  against  numerical 
considerations  that  a curve  fitting  method  must  be  judged.  The  ease  with  which 
a cubic  spline  is  fitted  must  have  contributed  to  its  popularity. 

We  now  sketch  the  main  steps  in  fitting  a cubic  spline  to  data.  Suppose 
data  is  {(t^,y^),  i = l,2,...,n}.  Let  h^  » t^^^  - t.  and  take  = value  of 
second  derivative  of  interpolating  spline  S^  at  t^^,  i = l,2,...,n. 
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Suppose  the  polynomial  interpolating  and  ^'^i  + l'^i+p 


1.2.1 


y = aj(t-t^)^  + b.(t-t^)^  + c^(t-t^)  + dj 


then 


1.2.2 


b.  = M./2 
1 1 


»i  • '"i.i  - 


c.  = 
1 


y.  , - y.  2(h.M.  + h.M.  ,) 

•'i+l  ■'i  11  1 1+1 


h. 

1 


d.  = y. 

1 •'i 


Thus  our  curve  fitting  problem  reduces  to  that  of  finding  the  values  of  M^. 

The  equations  relating  the  are  obtained  by  using  the  continuity  of  the  first 
derivative  of  the  spline,  along  with  the  relations  1.2.2  to  give 


-^i+l"^i  ^i  ^i-1-, 


•'i.l  «1.1  ' * "l"!.!  ■ FT h-j 


for  i = 2,3, .. . ,n-l . 


Our  demand  that  M = M =0  leads  immediately  to  a tridiagonal  system  of  linear 
1 n 

equations  for  M2,...,M^  j.  This  system  can  be  easily  solved  by  Gaussian  elimi- 
nation. So  easily  in  fact,  that  fitting  an  interpolating  spline  to  40  points 
is  feasible  with  only  a simple  calculator. 

it  was  soon  realized  that  the  cubic  spline  was  the  solution  of  an  optimi- 
zing problem  which  could  be  easily  generalized.  Let  L be  a differential 
operator  with  constant  coefficients  of  order  m (usually  D*")  and  let 
{(ti,yi):  i = l,...,n}  be  data  points.  The  problem 
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00  2 

minimize  (Lf)  dt 

subject  to  e L2 j = 0,1,..., m 

and  f(t^)  = y.  i = 1,. . . ,h 

has  a solution  f(t)  which  satisfies  L*Lf(t)  = 0 in  the  intervals  between  knot 
points,  where  L*  is  the  adjoint  operator  to  L.  We  call  such  a solution  an 
"Interpolating  L-Spline" . There  are  accounts  of  such  splines  in  the  books  by 
Ahlberg,  Nilson  and  Walsh  (1967)  and  T.  N.  E.  Greville  (1969).  However,  for 
statistical  purposes  another  type  of  spline  turns  out  to  be  more  useful. 

§1.3  Smoothing  Spline  Theory 

For  most  applications  in  statistics  the  smoothing  spline  is  much  more 
useful  than  the  interpolating  spline.  This  is  because  most  real-life  data  is 
subject  to  error  be  it  from  sampling,  measurement  or  other  sources.  There  are 
two  main  spline  fitting  methods  in  common  use  corresponding  to  different  ways 
of  dealing  with  the  "noise"  in  the  data.  Because  this  data  does  not  constrain 
the  fitted  function  nearly  as  firmly  as  in  the  interpolating  spline  case,  the  • 

fitting  requires  a genuine  optimization  routine,  not  just  the  simple  solution  i 

of  a linear  system  of  eqiiations  as  with  the  cubic  interpolating  spline. 

The  first,  more  frequently  used  method,  parallels  the  least  squares  curve 
fitting  procedure  by  minimizing  a criterion  depending  on  squares  of  deviations 
from  data  points  and  on  the  "roughness"  of  the  fitted  curve.  When  we  have 
little  or  no  knowledge  of  the  magnitude  of  possible  errors  in  our  data  this 
method  is  the  appropriate  one  to  use. 

On  the  other  hand  when  the  data  points  are,  for  example,  direct  readings 


from  a calibrated  instrument,  we  may  be  able  to  set  fairly  narrow  100%  confi- 
dence limits  for  each  data  point.  The  second  method  is  used  in  these 
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circumstances.  For  this  we  need  to  replace  the  ordinate  y in  the  two  dimen- 
sional data  with  a 100%  confidence  interval,  and  constrain  the  fitted  spline 
function  to  pass  through  all  of  these  intervals.  This  is  accomplished  in 
practice  by  using  an  optimization  routine  to  minimize  the  (convex)  roughness 
criterion,  subject  to  the  linear  constraints.  It  will  be  noticed  that 
method  attaches  considerable  importance  to  outliers,  rather  than  largely 
ignoring  them. 


First  Method  of  Fittiruj  Smoothing  Splines 
Suppose  the  t values  of  the  data  lie  in  a finite  interval  say  [0,1]  and  we 
have  0 < tj^  < t2  < . . . < t^  < 1.  Fitting  the  spline  leads  us  to  solving  the 
following  problem. 


1.3.1 


Minimize  \ (f(t.)-y.)^  + X dt 

i=l  ^ ^ 

Subject  to  f e W^,  X fixed  > 0 . 


The  solution  is  given  explicitly  in  the  paper  of  Kimeldorf  and  Wahba 
(1970)  and  as  expected  turns  out  to  be  a polynomial  spline  of  degree  2m-l  with 
possible  knots  at  the  data  points.  As  so  often  happens,  this  theoretical 
solution  cannot  be  used  as  an  algorithm  in  any  realistic  practical  case.  When 
the  t values  are  evenly  spaced  throughout  [0,1],  and  f is  periodic,  Cogburn 
and  Davis  (1974)  show  how  to  do  the  fitting  more  easily.  Apart  from  this  one 
happy  instance,  a heavy  optimization  is  invariably  required. 

Notice  that  the  number  X > 0 in  1.3.1  controls  the  amount  of  smoothing:  X 
too  small  results  in  overfitting  and  insufficient  removal  of  noise,  whereas  X 
too  large  results  in  underfitting  and  removal  of  much  of  the  wanted  signal 
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with  the  noise.  Clearly  the  correct  choice  of  X is  of  the  greatest  importance. 

A satisfactory  solution  to  this  problem  is  given  by  Wahba  and  Wold  (1975), 
although  not  all  theoretical  consequences  are  yet  developed. 

Second  Method  of  Fittincj  Smoothing  Splines 

From  what  was  written  in  the  preamble  of  this  section,  the  reader  will  have 
seen  that  this  spline  is  a cross  between  the  interpolating  spline  and  our  first 
smoothing  spline.  For  this  reason  the  fitting  technique  is  also  known  as  (the 
solution  procedure  for)  the  Generalized  Hermite-Birkhoff  Interpolation  Problem- 
(GHB  problem). 

Let  [a^,3^]  be  the  100%  confidence  interval  for  the  ordinate  at  t^  (with 
^i  ^i^  ’ problem  is 

Minimize  (f^"*^)^  dt 

1.3.2  subject  to  the  constraints  f e W , a.  < f(t.)  < S. 

for  i * 1,2, ... ,n. 

Various  recent  contributions  to  the  theory  of  such  splines  have  been  made 
by  M.  Atteia  (1968),  P.  J.  Laurent  (1969),  and  K.  Ritter  (1969).  Because 
Hilbert  space  methods  are  directly  applicable,  this  spline  has  been  more 
thoroughly  theoretically  analyzed  than  the  first  smoothing  spline. 

For  our  results  on  isotonic  splines  we  are  able  to  adapt  the  methods  of 
Atteia  and  Laurent  to  the  isotonic  situation,  so  our  isotonic  splines  are  of 
this  second  kind.  For  the  record,  the  solution  of  1.3.2  is  a spline  of  degree 
2m- 1 with  knots  at  those  data  points  where  the  constraints  are  active. 
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§1.4  Some  Simplifications 

For  most  practical  applications  a slightly  sub-optimum  solution  may  be 
preferable  to  the  optimum  if  it  involves  a great  deal  less  computation  effort. 
We  have  already  seen  that  not  all  of  the  data  points  are  knot  points  for  the 
smoothing  spline.  By  using  the  following  guidelines  the  old  statistical  virtue 
of  eyeballing  the  data  can  be  converted  to  considerable  computational  advantage. 
Knot  Point  Selection  (Cubic  smoothing  spline) . 

1)  Knot  points  should  be  at  data  points. 

2)  Try  to  have  at  least  4 or  5 data  points  between  knots. 

3)  Have  not  more  than  one  extremum  and  one  inflexion  point  between  knots. 

4)  Have  extrema  centered  in  intervals  and  inflexion  point  near  knots. 

For  more  details  see  Wold  (1974). 

The  form  of  the  optimal  epline  function  f. 

Suppose  data  points  {t.  , t.  , ...,  t.  } have  been  chosen  as  knots.  Then  the 

^1  ^2 

optimal  spline  function  will  be  f such  that  f(t)  = a + at  + a_t"^  + ^ d.* 

0 i 2 j=l  ^ 

where  (t  - c)5  | = J ^ 


(t  - t.  )■' 
1.  + 


c 

c 


§1.5  Bayesian  Estimation  Again 

We  are  now  in  a position  to  make  our  remarks  about  the  equivalence  of 
smoothness  and  Bayesian  posterior  means  precise. 

Let  L = Ij_Q  be  a differential  operator  and  let  B = [b^j^]  be  a 

positive  definite  matrix.  Suppose  B ^ = [b^*^!. 


Problem  I 


Find  f £ W (-<»,«>)  which  minimizes 
m 

I (f(t  ) - yjbj’'(f(t  ) - y^)  ^ (Lf)2  dt 

j.k  •' 


I 
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Problem  II 


Find  f(t)  with  f(t)  = E(x(t) |y(t^) , y(t2) , . . . ,y(t^)) 
where  y^  = x(t^)  + e^  with  e^  "v  N(0,B)  and  x(t)  is  a 
^ stationary  Gaussian  process  with  mean  zero  and  spectral 

density  f(X)  = ^ ■fp^XTp  ^j=0 


In  their  paper  of  1970,  Kimeldorf  and  Wahba  show  that  the  solution  f of  Problems 
I and  II  is  the  same  function. 


§1.6  Some  General  Remarks 

(a)  A little  reflection  will  convince  the  reader  that  smoothing  spline  methods 
will  be  the  most  useful  when 

(i)  An  appropriate  parametric  model  is  not  known  and 

(ii)  High  accuracy  is  needed  and 

(iii)  A considerable  amount  of  (noisy)  data  is  available. 

When  these  conditions  are  satisfied,  spline  methods  will  give  very  good 
value  for  the  computational  effort  invested. 

(b)  The  (smoothing)  spline  is  very  much  the  child  of  its  age  - the  1960's  - 
depending  as  it  does  on  optimization  theory  and  the  medium/ large  computer 
for  its  implementation. 

(c)  Hilbert  space  methods  associated  with  L2  spaces  have  been  used  throughout 
this  thesis,  in  preferences  to  more  general  ones  in  other  spaces 
(where  these  exist).  In  the  absence  of  any  evidence  that  Lp  is  more 


appropriate  than  L2  for  the  particular  application,  we  believe  this  to  be 
justifiable.  The  fact  that  12(0,1]  can  always  be  embedded  in  Lp  [0,1] 
for  1 £ P £ 2 further  bolsters  this  position. 


CHAPTER  2 

Statistical  Spline  Methods  Literature 
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§2.1  Preamble 

In  the  introduction  we  tried  to  place  spline  methods  in  the  statistical 
scheme  of  things.  Splines  will  be  seen  to  be  a departure  from  the  most 
general  non-parametric  model  back  a little  towards  the  (structure  rich)  para- 
metric situation.  The  reward  for  the  changed  position  is  improved  efficiency 
coupled  still  with  non-parametric  integrity. 

Although  spline  functions  were  available  in  a highly  refined  form  by  the 
mid  1960's  they  were  for  some  years  largely  ignored  by  statisticians.  This 
situation  was  dramatically  changed  by  the  appearance  of  the  paper  by  Kimeldorf 
and  Wahba  (1970).  Although  the  author's  intention  was  to  show  the  equivalence 
of  smoothing  by  splines  withthe  finding  of  a posterior  mean,  the  real  effect 
was  to  convince  statisticians  that  splines  were  effective  and  not  as  difficult 
as  they  had  thought. 

When  all  is  said  and  done,  spline  methods  are  just  a way  of  fitting  a 
smooth  curve  to  some  data.  The  curve  estimates  most  studied  in  the  statistical 
spline  literature  are  for  non-parametric  density  estimation  from  an  independent, 
identically  distributed  sample  and  for  the  estimation  of  the  spectral  density 
of  a stationary  time  series.  Splines  have  also  been  used  in  other  areas,  but 
are  at  present  more  a curiosity  than  a serious  practical  tool. 

§2.2  Non- Parametric  Density  Estimation 

Our  account  will  be  confined  exclusively  to  density  estimation  based  on  an 
independent  identically  distributed  sample.  There  are  two  main  routes  we  may 
follow:  we  may  use  the  empirical  distribution  function  in  some  way  or  we  may 
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use  an  appropriate  analogue  of  maximum  likelihood  adapted  to  the  infinite 
dimensional  (non-parametric)  situation. 

The  empirical  density  is  very  easily  obtained  from  the  empirical  distribu- 
tion when  we  use  the  Sobolev  spaces  because  of  the  following: 


Lemma:  The  solution  f of  the  problem 


Minimize  (f^'’'^)^dt  with  f e 
and  f(t^)  = y^,  i = l,...,n 


and  the  solution  g of  the  problem 


(B) 


Minimize  (g^'""^^)^dt  with  g € 
and  (D‘^g)(t^)  = y^,  i = l,...,n 


are  related  by  Df  = g.  This  means  the  enqjirical  spline  fitted  density  is 
obtained  by  differentiating  the  spline  fitted  distribution  function  . 

The  histosplines  described  by  Boneva,  Kendall,  and  Stefanov  (1971)  are 
empirical  densities,  in  the  nature  of  a smooth  analogue  of  a histogram,  with 
pleasant  mathematical  features.  To  make  their  analysis  feasible,  the  authors 
are  prepared  to  allow  densities  which  sometimes  take  small  negative  values  in 
a small  region. 

Let  Wj (-«,<»)  denote  the  set  of  functions  on  the  real  line  which  are,  along 
with  their  first  derivatives,  in  L2 The  inner  product  is 
<u,v>  = (uv  + u'v')dt.  Also  let  I2  denote  the  set  of  square  summable 
(double  ended)  real  sequences  with  inner  product  (0u,9v).  Define  0:  -►  by 

(0u)j  = u(t)dt.  Now  define  a new  inner  product  tu,v]  on  (-<»,<»>)  by 
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[u,v]  = (6u,9v)«  + /”  u'v'  which  generates  the  same  topology  as  the  old 

one.  Write  Z = {u  e W^ : 0u  = 0}  and  S = {s  e W^:  [s,u]  = 0 for  all  u e Z}. 

Each  o € has  unique  decomposition  a = s + z,  s e S,  z e Z.  Thus  we  obtain 

the  projection  P:  W^  S where  Pa  = s. 

Then  the  authors  show 

1)  0 is  a 1-1  bicontinuous  map  S t2 

2)  S consists  of  all  s € s'z'  =0  for  all  z e Z 

3)  For  given  h e ^2*  unique  solution  of  0a  = h which 

JOO  2 

minimizes  J (a') 

• 00 

4)  S consists  of  those  functions  continuous  and  continuously  differenti- 
able such  that 

i)  s(t)  is  quadratic  in  each  cell 
ii)  / (s^  + s '^)< 

The  delta-spline  is  that  function  e S which  has  (0Sq)q=  1,  = 0, 

i 0.  This  function  is  tabulated  explicitly  in  the  paper.  All  of  the 
maneuvering  with  Z and  [ ] is  rewarded  with  the  following  result. 

Proposition:  Take  h c ^2*  integer  j,  let  s^  be  the  translated 

delta-spline  with  (0s.). = 1,  (0s.)r  = 0,  i / j.  Then  the  unique  histospline 

3 J j ^ j 

a € Wj  which  has  0a  = h and  which  minimizes  / (a')^dt  is  given  by  a = . j 

The  paper  of  Boneva,  Kendall  and  Stefanov  (1971)  also  describes  another  histo-  [ 

i 

spline  and  includes  much  empirical  material  on  histospline  behavior. 

Some  Remarke  on  Histosplinee 

1.  Once  the  tabulated  form  of  the  delta-spline  is  stored  in  the  computer 
(requiring  39  parameters)  there  is  no  explicit  optimization  required  - just  the 
grouping  of  the  data  into  classes.  Consequently  this  method  is  well  suited  to 


progranmable  calculators  and  mini-computers  without  optimization  routines. 

2.  Boneva,  Kendall  and  Stefanov  (1971)  and  Schoenberg  (1972a  and  b)  also 
consider  the  variant  histospline  defined  as  the  derivative  of  that  function  G 
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in  W2[0,l]  which  solves: 

Minimize  /J:  (G")^dt 

” i-1 

2.2.1  Subject  to  G(0)  = 0 and  G(ih)  h.  i = l,2,...,i, 

i=0  ^ 

where  (^+l)h  = 1 and  G'(0)  = G'(l)  = 0 

Yet  another  variant  of  this  problem  is  considered  by  Wahba  (1975b) . This 
involves  replacing  the  final  constraints  in  2.2.1  by 

G' (0)  = and  G' (1)  = 8^  where  a^,  8^ 

are  calculated  from  the  empirical  distribution  function.  This  variant  gives 
better  accuracy  near  0 and  1 than  2.2.1,  When  the  criterion  is  minimum  mean 
square  error  at  a point,  Wahba  (1975b)  also  shows  how  to  choose  h optimally. 

Finally  we  note  that  if  the  problem  2.2,1  has  the  further  constraint 
G'(t)  ^ 0 for  all  t e [0,1]  the  solution  will  be  isotonic  with  respect  to  a 
natural  order  on  W2  and  will  give  a more  acceptable  density  function. 

3.  It  must  be  emphasized  that  histosplines  are  interpolating  splines  based  on 
the  sample  histogram,  and  not  a smoothing  spline.  Consequently  in  the  presence 
of  noise  (sampling  error)  we  cannot  expect  this  method  to  be  much  better  at 
filtering  the  noise  than  the  histogram  it  is  derived  from. 

This  assertion  is  supported  by  the  results  of  Wahba  (1975b)  who  shows  for 
her  variant  of  the  histospline  that  for  the  true  density  f « and  f^  the 
histospline  corresponding  to  a sample  of  size  n 
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E(£jj(t)  - = 0 

In  a companion  paper  Wahba  (1975a)  shows  that  the  expected  mean  square  error  at 
a point  t has  that  same  order  of  magnitude  for  all  of  the  following  estimation 
methods:  the  polynomial  algorithm  (Wahba),  kernel  type  estimator  (Parzen) , 
certain  orthogonal  series  estimates  (Kronmal- Tatar) , and  the  ordinary  histogram 
However,  the  constants  covered  by  the  0 may  be  larger  in  these  latter  cases. 


r 


1 


§2.3  Densities  by  Maximum  Penalized  Likelihood 

This  area  is  relatively  unexplored  to  date.  The  analogy  with  parametric 
maximum  likelihood  estimation  gives  rise  to  the  hope  that  Maximum  Penalized 
Likelihood  Estimators  (MPLEs)  may  be  optimal  in  some  fundamental  sense.  We  now 
look  at  some  of  the  details. 

Let  12  be  an  interval  (a,b)  and  let  H(12)  be  a manifold  in  Lj(fi).  Suppose 
(tj^,t2,  • . . .t^)  is  a i.i.d.  sample  from  an  unknown  density  f e Lj(fi). 
Unfortunately  the  problem 

n 

Maximize  L(v)  = ] f v(t.)  subject  to  v e H(12), 

2.3.1  i=l  ^ 

v(t)dt  = 1,  v(t)  ^0  Vt  e 12 

will  not  have  a solution  for  most  manifolds  of  interest  (certain  isotone  mani- 
folds for  example  of  unimodal  or  monotone  functions  are  an  exception).  Specifi- 
cally, any  manifold  which  contains  an  approximating  sequence  to  any  linear 
combination  of  6-functions,  admits  no  maximum  likelihood  estimator  for  the 
density  f. 

From  heuristic  Bayesian  considerations.  Good  and  Gaskins  (1971)  suggested 
adding  a penalty  term  to  the  likelihood  which  would  penalize  unsmooth  estimates. 
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They  chose  a manifold  and  penalty  function  that  lead  inevitably  to  polynomial 
splines.  Good's  and  Gaskin's  results  were  refined  and  made  rigorous  by 
de  Montricher,  Tapia  and  Thompson  (1975).  We  can  now  describe  the  current 
state  of  the  art. 

It  will  normally  be  the  case  that  the  manifold  H{f2)  is  contained  in 

W (£1)  and  the  penalty  function  tCv)  = l|v||„  . Let  L(v)  = I rv{t.)exp(-<I'(v)) 

^ m i=l 

and  consider  the  optimization  problem 

Maximize  l^(v)  subject  to 
(i)  V c H(fi) 

(ii)  /jj  V dt  = 1 
(iii)  v{t)  > 0,  Vt  e £2 

The  solution  v is  the  MPLE  of  the  underlying  density,  f. 

The  task  of  computing  the  MPL  Estimate  of  the  density  is  greatly  simpli- 
fied by  knowing  the  form  the  optimum  must  take.  The  following  existence 
theorem  is  proved  in  the  paper  by  de  Montricher,  Tapia  and  Thompson  (1975)  . 

Theorem:  For  m ^ 1,  the  MPLE  corresponding  to  W^  exists,  is  unique,  and  is  a 
polynomial  spline  of  degree  2m-l.  Moreover,  if  the  estimate  is  positive  in 
the  interior  of  an  interval,  then  in  this  interval  it  is  of  degree  2m- 1 and  of 
continuity  class  2m-2  with  knots  at  the  sample  points. 

From  (Fisher)  information-theoretic  considerations,  as  well  as  a desire  to 


t - 


2.3.2 


avoid  the  awkward  non-negativity  constraint  v(t)  ^ 0,  Good  and  Gaskins  (1971) 
also  considered  the  MPLE  problem  with  manifold 
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Hj(0)  = {v:  V**  € Wj  and 

' 2 

<f^{v)  = a /”^  dt,  a > 0. 


Hj consists  of  those  non-negative  functions  in  whose  square  roots  have  deriv- 
ative in  L^f  and  the  squared  norm  of  the  derivative  of  the  square  root  is  the 
roughness  penalty. 

After  noting  that  the  reformulation  trick  2.3.3  is  standard  in  the  liter-  . 

ature,  deMontricher,  Tapia  and  Thompson  (1975)  record  conditions  for  its  valid  j 

use  with  the  following  lemma.  I 


Lemma:  Let  H be  a subset  of  L2(n)  and  J a functional  on  H.  Consider 


Problem  I 


u 

Maximize  J{v  ) subject  to 
v"*  e H,  /v  dt  = 1,  v(t)  > 0 Vt 


Maximize  J(u)  subject  to  u e H 

and  Problem  II 

and  / u^dt  = I 

Let  u*  be  a solution  of  II.  Then  v*  = (u*)^  solves  I if  and  only  if  ju*)  e H and 
J(u*)  = J(|u*|). 

The  authors  (de  Montricher,  Tapia  and  Thompson)  go  on  to  establish  that  the 
price  of  using  the  non-negativity  trick  is  to  lose  the  polynomial  spline  form  of 
solution  - the  solution  is  an  exponential  spline  instead,  with  knots  at  the  sample 
points. 

We  regretfully  record  that  on  the  two  most  important  aspects  - how  effec- 
tively noise  is  filtered  out,  and  the  asymptotic  (large  n)  performance  - the 

9 

literature  on  MPLEs  is  quite  silent. 
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§2.4  Noise  Filtering  by  Smoothing  Splines 

Statisticians  and  applied  mathematicians  are  continually  faced  with  the 
problem  of  recovering  a smooth  function  when  only  noisy  measurements  of  it  are 
available.  In  fitting  a parametric  model  the  residuals  are  made  up  of  the  noise 
as  well  as  the  deviations  of  the  model  from  the  true  function.  Smoothing 
splines  are  admirably  placed  to  estimate  this  true  function  (known  only  to  be 
smooth)  for  two  reasons.  First,  they  are  flexible  enough  to  respond  to  any 
real  local  variation,  without  allowing  pathological  behavior,  and  second,  the 
actual  degree  of  smoothing  (=  filtering  of  noise  together  with  rapid  variation) 
is  controllable.  Even  when  the  correct  degree  of  smoothing  is  unknown,  these 
features,  in  conjunction  with  a technique  called  cross-validation  (to  determine 
the  correct  degree  of  smoothing)  allow  us  to  remove  most  of  the  model  deviation 
component  from  the  residuals,  leaving  virtually  only  the  (real)  noise.  We 
presently  give  an  account  of  the  main  features  of  fitting  smoothing  spline 
functions  by  cross  validation  - full  details  are  given  by  Wahba  and  Wold  (1975a 
and  b) . 
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Consider  the  problem:  Find  f c W to 

m 

Minimize  d i:(Y(t.)  - f(t.))^*A/J  (f^"'^)^dt) 

2.4.2  ” J J u 

where  X > 0 is  a fixed  real  number. 

The  first  term  is  a measure  of  the  fidelity  to  the  data,  the  second  is  X times 
the  "smoothness"  of  f.  The  optimum  solution  is  known  (Greville  (1969), 

Reinsch  (1967))  to  be  a cubic  spline  with  knots  at  the  t ^ , i = 1,2,. ..,n.  As 
X the  solution  f , approaches  its  smoothest  possible  form  - the  least 

n > A 

squares  straight  line  through  the  data.  As  X -»•  0,  f , approaches  the  inter- 

n>  A 

polating  spline  through  all  of  the  data  points.  Thus  we  call  X the  degree  of 

smoothing.  It  is  shown  in  Wahba  (1973  and  1974)  that  in  order  to  have 

W 

m 

f ,-^f  asn-^-^we  must  also  have  X -*■  0. 
n y A 

If  ( from  previous  experience  of  our  particular  problem)  the  correct  value 
of  X is  known,  we  have  only  to  solve  2.4.2  using  that  X.  Unless  the  problem 
2.4.1  can  be  converted  to  the  periodic  smoothing  spline  form  of  Cogbum  and 
Davis  (1974)  there  is  no  simple  way  of  solving  2.4.2  other  than  the  usual 
optimization  routine. 

When  X is  not  known  we  can  (with  much  labor)  use  the  Cross  Validation 
Mean  Square  Error  (minimizing)  technique  to  estimate  the  appropriate  degree  of 
smoothing  from  the  data  alone.  The  method  has  been  used  successfully  in  various 
applications  by  Feinberg  and  Holland  (1972),  Hocking  (1972),  Mosteller  and 
Wallace  (1963)  and  others.  In  effect  the  CVMSE  method  gives  the  value  of 
parameter  which  maximizes  the  internal  consistency  of  the  data  set  with  regard 
to  the  applied  model. 

Wahba  and  Wold  find  it  useful  to  recast  problem  2.4.2  into  a form  used  by 


Reinsch  (1967). 
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2.4.3 


1 2 

Find  f e ^2  to  minimize  (f ')  dt  subject  to 
1 " ? 

— I (Y(t.)  - f(t.))  < S,  where  S is  given. 

" j=l  J ^ 


It  is  well  known  that  if 


n S £ inf  ^ (Y(t.)  - a + b t.)' 
a,b  ^ ^ 


then  there  exists  a unique  X = X(S)  such  that  f , is  the  solution  to  2.4.2  and 

ri  y A 


- I (Y(t.)  - f , (t.))^  = S. 
n j-*  n,X  ^ j 


Armed  with  the  appropriate  tools  from  the  last  paragraph,  we  now  give  an 
account  of  Wahba  and  Wold's  Cross  Validation  procedure  (1975a). 

1)  Divide  the  data  set  into  p groups 

Group  1:  t^.  t^^p.  t^^2p*  ••• 

Group  2:  t2.  t2^p.  t2^2p-  ••• 

Group  p:  tp.  t2p,  tjp.  ... 

2 

2)  Guess  a starting  value  for  S (Almost  invariably  S = ko  with 
0.7  £ k £ 1.  A reasonable  starting  point  might  be  k = 0.8). 

3)  Delete  the  first  group  of  data.  Fit  a smoothing  spline  to  the  rest  of 
the  data  using  the  method  of  Reinsch  with  the  S of  step  2.  Compute  the  sum  of 
squared  deviations  of  this  smoothing  spline  from  the  deleted  data  points. 

4)  Delete  instead  the  second  group  of  data.  Fit  a smoothing  spline  with 
the  S of  step  2.  Compute  the  sum  of  squared  deviations  of  the  spline  from  the 
data  points. 
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n 

f 

r 

f 

5)  Repeat  Step  4 for  the  3rd,  4th,  . . . , p^^  group  of  data. 

6)  Add  the  sums  of  squared  deviation  from  steps  3-S,  and  divide  by  n. 

This  is  the  CVMSE  for  S and  is  written  CV(S). 

7)  Determine  the  S = making  CV(S)  a minimum. 

The  smoothing  problem  2.4.3  can  now  be  solved  with  S = S^. 

2 

Empirical  studies  by  Wahba  and  Wold  (1975a)  indicate  that  when  o is 

2 

extremely  small,  the  CVMSE  estimate  for  k in  S = ko  has  positive  bias, 

f 

resulting  in  very  slight  undersmoothing.  This  effect  is  negligible  for  realis- 

2 ' 
tic  sized  a , although  the  authors  do  not  present  a proof.  ] 

i 

] 

§2.5  Periodic  Smoothing  Splines 

The  work  of  Cogburn  and  Davis  (1974)  has  been  referred  to  several  times 
already.  We  now  describe  their  results  in  detail. 

Let  G be  the  group  of  real  numbers  modulo  2n  with  the  usual  topology  and 

! ■. 

I measure.  The  model  to  be  fitted  is 

I 
f 

; h(t)  = f(t)  + e(t)  Vt  e G 

2.5,1  with  f e W (G)  and  Ee(t)  = 0 , Vt  c G 

• m 

; 2 

and  Ee(s)e(t)  = a s = t 

' =0  s ^ t 

where  h is  observed  either  on  a lattice  of  points  or  continuously  and  the  noise 
2 

variance  a is  unknown.  The  asymptotic  solution  for  large  n devised  by  Cogburn 
• and  Davis  is  very  convenient  to  handle,  and  easy  to  compute  since  it  avoids 

explicit  optimization. 

First  we  need  some  definitions.  Let  L2(G)  be  the  set  of  all  square  inte- 
grable  functions  on  G.  Let  W^(G)  = {g  e L2(G):  g^^^  is  absolutely  continuous, 
j = 1,2,...,  m-1  and  g^"*^  e k,(G)}.  Let  L be  a linear  differential  operator 

( 

' i 

I j 
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of  order  m 


L = o'"  + ^ with  Yj  constant. 


1r  ^ 

In  approximating  a function  h e L2(G)  by  g c we  need  the  following 

1 r2n  , 1 /•2n^,  .,2^^ 

measure  of  closeness.  ^ L ^ n ^k=l  V ^ * ~2vr  ^0 

* * X IT 


The  function  g e W^(G)  minimizing  A(g,h)  will  be  called  the  Periodic  Lattice 


Smoothing  Spline  (LSS)  to  h.  When  h is  known  for  all  t e G the  smoothing 


spline  to  h is  that  function  g £ minimizing 


^.L  ‘8.h)  = /g  '*">  - "<'»"<**  * ^ '>■«>'  'I'- 


Such  a g £ W^(G)  is  called  the  Periodic  Continuous  Smoothing  Spline  (CSS)  to  h. 


Cogburn  and  Davis  discuss  algorithms  for  fitting  the  LSS  and  CSS.  We 
summarize  their  final  form. 


Let  P(u)  be  the  characteristic  polynomial  of  L so  that 


(u)  = u"*  + Y,u™”^  + ...  + Y and  let  Q(k)  = |P(ik)|'^,  i = Take 

1 m 


2 


q_  . = 1 + Q(j)  [ Z l/Q(j+2nJl)]  and  let  a , . = ^ ( 

Tt.j  n.A.j  ^n  ^ 


2m 


,2n 


-) . I j 1 1 n 


Q(j)  - 


and  a , » = j-  a . . for  H £ {j  ♦ 2n,  j + 4n,  ...},  |j|  < n.  Then  the  LSS 
n , A > x»  xv^./ 


to  h is  given  by  the  (discrete)  convolution 


S (t)  = I h(f-)S  X ^ ) 
£=-n+l 


Jin 


where  S^  j^(t)  = a^  e 


ikt 


Letting  a^  = lim  n a 


, 2m 


n-»<» 


n , X , H 


and  S.  (t)  = 4 y a,  . we  can  closely  approximate  n S , (t)  by 

2(Q(Jl)  * X^™)  ^ ^ 


iJlt 


22 


n The  CSS  to  h is  given  by  the  convolution 

h d S^(t)  = h(>)  S^(t  - y)dy  . 


If  it  is  known  that  the  function  f to  be  estimated  from  2.5.1  has  derivatives 
of  order  m,  hut  no  specific  operator  L is  known,  a natural  choice  is  L = d"'; 
in  which  case  S^{u)  becomes 


, 2m 


ilu 


1 r” 


,2m 


Writing  t, (u)  = ^ f 


211  ^ 2m 


2m 


X • ■ + y 

shown  h ® S^(u)  = h 9 tj^(u)  = / ^ h(y)t^(u  - y)dy 


e ^ dy  and  t;^(u)  = tj^(u+2kll),  it  is 


Remarks 

1)  The  asymptotic  results  of  Cogburn  and  Davis  are  doubly  valuable 
because:  (a)  they  are  computationally  easy,  and  (b)  they  facilitate  further 
theoretical  investigation  by  giving  the  LSS  and  CSS  in  closed  form. 

2)  The  requirement  that  the  data  should  be  periodic  and  lattice  is 
tailor-made  for  the  usual  spectral  estimation  from  periodogram.  Unfortunately, 
any  6-function  spikes  in  the  spectrum  which  cannot  be  well  approximated  by 
functions  in  W^(G)  will  not  be  well  resolved.  The  next  section  consider^  this 
question. 

3)  No  work  seems  to  have  been  done  on  ways  to  convert  non- lattice,  non 
periodic  data  fitting  problems  to  the  easy  periodic  lattice  form  of  this  section. 
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i; 

§2.6  Estimating  Spectral  Densities 

Determining  spectral  densities  gives  spline  theory  not  only  a great 
opportunity  but  also  a severe  test.  When  the  spectrum  is  absolutely  continuous, 
spline  methods  are  extremely  effective.  However,  when  the  spectrum  has  a 
discrete  component,  i.e.  6-function  spikes,  spline  methods  based  on  L2  have 
little  chance  of  sharply  resolving  the  spike  without  a great  deal  of  data  in 
the  vicinity  of  the  spike.  The  heart  of  the  difficulty  is  that  every  sequence 
of  functions  approximating  a 6-function  must  be  unbounded  in  L2  norm  and  so 
also  in  norm,  and  thus  the  smoothing  spline  is  duty  bound  to  flatten  these 
real  spikes  out.  However,  before  abandoning  the  L2  based  spaces,  we  need  to 
see  the  problem  in  perspective.  Because  of  the  superficial  similarity  between 
a 6-spike  and  a noisy  observation,  any  attempt  to  transfer  the  problem  to  a 
space  where  6-function  approximants  are  bounded  seems  doomed  to  failure  because 
we  would  be  unable  to  filter  out  the  noise  in  such  a space. 

Thus  reconciled  to  remaining  in  with  its  pleasant  inner  product  and 
Fourier  Transform  structure  we  may  yet  be  able  to  find  a way  out.  One  strategy 
may  be  to  proceed  as  follows.  Since  further  data  points  are  easily  obtained 
from  the  periodogram,  it  might  be  feasible  to  use  repeated  applications  of  the 
CVMSE  method  coupled  with  a procedure  to  introduce  extra  data  points  in  regions 
where  the  rate  of  change  (of  fitted  function)  is  large. 

First  let  us  examine  the  current  state  of  the  art  for  estimating  spectral 
densities  with  spline  methods. 

Estimatinci  the  Spectral  Density  of  a Stationary  Stoahastia  Prcaess 
Let  Xj,X2,Xj,...  be  a second  order  stationary  stochastic  process  with 
EX^  = 0,  EX.X.^,  = p,. 


i 
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The  Pj^  are  Fourier  coefficients  of  a symmetric  distribution  function  F on 
[-11,11],  ^ -^0  ^ F(w).  When  F is  absolutely  continuous,  it  is 

completely  determined  by  the  spectral  density 


f(w)  = {DF)(w) 

f(w)  = I . 

..00 


The  statistical  problem  is  to  estimate  f(w)  on  the  basis  of  observations 
X^,X2,. . Let  f(w)  denote  the  periodogram 


2.6.1 


n-1  n-1 

f(w]  = I P,  e = Pq  I P^  Cos  kw  where 
-n+1  k=l 

■ K "i'  “ ■ » 


When  the  process  is  Gaussian  it  is  shown  in  Walker  (1965)  that 


2.6.2  f(w)  •=  f{w)  U^(w)  + 

where  n ->■  0 in  probability  as  n -<■  “ and  U (jll/n)  are  uncorrelated  exponential 
n fc 

random  variables  with  a mean  and  variance  of  1 for  j = 0,...,+  n-1. 

Since  the  periodogram  is  an  inconsistent  estimator  of  f,  some  modification  is 
required.  Smoothed  (consistent)  estimators  of  f{w)  are  obtained  either  by 
smoothing  the  periodogram 


f*(w)  * f(X)  K(w  - X)  dX 


or  by  weighting  the  covariances  by  a "lag  window"  kj^(r)  giving 
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f*  (w) 


1 M 

= — y 

2n  ^ w 
r=-M 


kM(r)e-'^  e. 


where 


K(w)  = ^ I e-^^  k^(r)  . 

^ -M 

Cogburn  and  Davis  (1974)  consider  estimating  f by  a CSS  or  LSS  to  f; 

ie.  f 8 S , or  f 0 t,  . They  take  L = D and  obtain  an  estimate  of  the 
n.A  A 

IT  X 2 

integrated  mean  square  error  for  n large.  It  is  /_j^  MSE(?(t))dt  = ^ 

2 

f^dt  + -k-  f^-n  (f^^"*^)  dt.  The  value  of  X minimizing  the  RHS  is 
-IT  ^4m  -H 

J 

X = ( / (f(2">))2dt/  / f^dt)^""^ 

° a 2 

m 

and  the  resulting  MSE  is  O(^). 

Wahba  and  Wold  (1975b)  are  concerned  with  estimating  log  f(w)  for  spectral 
densities  f which  are  bounded  below.  Taking  the  logarithm  of  (2.6.2)  converts 
the  curve  fitting  problem  to  that  of  §2.5. 

We  have 


Y(j)  = log  f (^)  + Y + e^ , j = ;»1,^2, . . . ,+n-l 
7 n2 

with  Ee.  = 0.  Ee.  = ^ , Y = 0.5772 Euler's  constant,  and  a reasonably 

j ' j 6 

symmetric  distribution  of  e^  about  0 with  constant  variance.  In  their  paper 
(1975b)  Wahba  and  Wold  use  various  results  from  Cogburn  and  Davis  (1974)  en 
route  to  their  objective  which  is  to  show  (in  principle)  that  the  smoothing 
parameter  chosen  by  CVMSE  converges  to  the  parameter  minimizing  the  mean  squared 


error. 
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Wegman  (1977)  records  the  various  advantages  and  disadvantages  of  using 
log  f(w)  rather  than  f(w)  for  the  spline  fitting  model.  Although  the  fit  to 
log  f(w)  is  improved,  the  kernel  interpretation  and  the  attendant  results  on 
consistency  are  lost.  For  multiple  time  series  (with  which  Wegman  was 
concerned)  various  functions  such  as  transfer  function,  multiple  coherence  and 

/S  A* 

coherency  fit  the  spline  model  directly  via  log  ('*')»  ®tc.  and 

each  of  the  above  functions  may  be  estimated  directly  with  a spline,  rather 
than  as  products  and  quotients  of  spline  estimates. 
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CHAPTER  3 
Isotonic  Splines 


§3. 1 Preamble 

There  are  many  model  fitting  problems  where  we  either  have  some  prior 
knowledge  of  the  form  the  solution  must  take,  or  else  have  some  insight  into 
the  laws  governing  the  system.  This  knowledge  may  be  equivalent  to  the  require- 
ment that  the  fitted  function  preserve  some  order  on  the  data  points.  An 
obvious  example  is  a fitted  distribution  function  - this  must  satisfy 
F(x)  P(y)  whenever  x ^ y.  Functions  which  preserve  (in  some  sense)  an  order 
relation  of  their  arguments  are  called  isotone.  Knowledge  of  isotonicity  may 
follow  from  very  elementary  considerations.  For  technical  convenience,  the 
class  of  partial  orders  will  need  to  be  restricted  to  those  compatible  with 
the  (additive)  group  of  all  real  functions.  Our  framework  will  certainly  be 
general  enough  to  cater  for  the  monotone,  or  convex,  or  positive  functions  and 
other  families. 

The  difficulty  in  compelling  an  arbitrary  function  to  be  isotone,  is  that 
this  in  general  involves  an  infinite  number  of  constraints.  While  our  methods 
can  deal  with  a continuum  of  constraints  (if  they  are  weakly  compact)  they 
cannot  save  us  from  the  consequences  of  them.  As  our  results  show,  the  most 
general  isotone  spline  solving  the  GHB  problem  may  have  an  infinite  number  of 
knot  points.  This  situation  is  unsatisfactory  to  numerical  analysts,  who  need 
concrete  solutions.  However,  we  also  propose  and  solve  a slightly  less  general 
isotone  spline  problem,  which  has  only  a finite  number  of  boundedness  and 
uniformity  condition  (of  a form  that  a user  can  live  with) , and  the  fitted 
spline  has  only  a finite  number  of  knots. 
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The  main  justification  for  smoothing  spline  theory  is  that  it  provides  a 
non-par ametric  framework  for  extracting  a function  from  some  noisy  observations. 
It  is  expected  that  isotonic  splines  will  facilitate  an  even  more  drastic 
filtering  of  the  noise.  Regretfully,  we  offer  no  evidence  on  this  here. 

§3.2  Partially  Ordered  Groups  of  Functions 

We  shall  be  dealing  only  with  abelian  groups  of  real  functions  which  can 
be  added  pointwise. 

The  relation  > on  the  group  G of  functions  is  required  to  satisfy  the 
following  conditions 

i)  f > f 

ii)  f ^ g>  g ^ h implies  f > h 

iii)  f ^ g implies  f + h > g + h for  all  h e G 

iv)  f ^ 0.  g ^ 0 implies  f + g > 0 

v)  f ^ g.  g ^ f implies  f = g. 

A group  satisfying  (i)  to  (v)  is  called  a partially  ordered  group.  Sometimes 
it  is  convenient  to  drop  the  insistence  on  (v)  in  which  case  we  have  a 
pre- ordered  group. 

The  set  P = {g  e G:  g > 0}  is  called  the  positive  cone  of  the  order  >. 

The  set  P defines  and  is  defined  by  the  (partial)  order  >.  We  also  say  that  P 

is  just  the  set  of  functions  which  are  isotone  with  respect  to  the  order  >. 

§3.3  Realization  of  Partial  Orders  on  W^, 

We  restrict  our  attention  to  W^  although  the  results  and  methods  can  apply 
to  more  general  spaces. 

Let  F be  a continuous  linear  map  of  into  which  commutes  with 

differentiation  operators.  That  is  to  say 
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D(Ff)  » F(Df)  for  all  f £ c 


It  is  known  that  such  an  F can  be  represented  in  the  form  Ff  = S*f  where  S is 
a Schwartz  distribution. 

We  can  now  define  a partial  order  > on  W^  by  f ^ 0 if  and  only  if 
(Ff)(t)  > 0 for  Vt  € [0,1] . 


Examples:  (i)  If  F = Identity  the  set  of  functions  > 0 are  just  the  positive 
functions  in  W . (ii)  If  F = D,  the  set  of  functions  > 0 is  just  the  set  of 


m 


monotone  increasing  functions  in  W . (iii)  If  F = D the  set  of  functions 


> 0 is  just  the  set  of  convex  functions  in  W^ 


§3.4  Isotonic  Smoothing  Splines 

The  isotonic  splines  dealt  with  in  this  chapter  solve  the  problem: 


3.4.1 


Minimize  /J  (f^™^)  dt  subject  to 


few,  a.  < f(tj  < e. , Ff(t)  >0  Vt  e [0,1], 
m 1 — 1 1 


Our  results  show  that  a solution  exists  and  is  a spline  of  degree  2m-l.  The 
Hilbert  space  methods  used  in  this  problem  cannot  be  used  directly  with  the 
other  (commoner)  smoothing  spline  problem: 


3.4.2 


Minimize  (X  (f^™^)  dt  + I(yj  - f(t^)) 


subject  to  ^ ^ i 0 Vt  . 
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The  objective  function  in  3.4.2  is  not  a Hilbert  space  norm  squared,  and  hence 
the  difficulty.  However,  3.4.2  is  the  minimization  of  a convex  function  with 
linear  constraints  so  it  can  in  principle  be  handled  by  non-linear  programming. 
The  existence  of  a solution  of  3.4.2  is  never  in  doubt  - only  its  character. 

§3.5  A Restricted  Isotonic  Spline 

First  we  solve  the  restricted  problem,  then  move  on  to  the  most  general 

one. 

Conditions:  Let  F be  a continuous  linear  map  ^ -*■  which  commutes  with 

differentiation.  Denote  the  partial  order  defined  by  F on  by  > . Assume 

there  exists  a function  g ^ satisfying  < g(t^)  < 3^  for  i = l,2,,...,n 

with  (Fg)(t)  > 0 for  Vt  e [0,1].  The  continuity  of  DF  implies  there  exists 

d > 0 such  that  I |DFf|  I,  < dl  iD^fl  1,  , Vf  c W^.  Take  ^ ^'*4 

2 2 - 2 

€ 

Theorem  1:  Under  the  conditions  just  given  the  problem: 

Minimize  (f^'”'^)^  dt  subject  to 

few,  f >0,  a.  < f(t.)<  3.  i = 1,2,. ..,n 

m — 1 — 1 — 1 

Ff(j/N)  > e j = 0,1,. ..,N 


has  a solution  which  is  a polynomial  spline  of  degree  2m- 1 with  possible  knots 
at  t^;  i » l,...,n  and  j/N;  j = 0,1,..., N.  Before  we  can  prove  Theorem  1 we 
need  to  establish  some  preliminary  results,  and  two  major  theorems. 


Lemma  1: 


and 


Ff{j/N)  ^ c for  j=0,l,...,N  imply  f > 0. 

The  straightforward  proof  is  omitted. 

Let  T be  a continuous  linear  map  05^0  case  T = D ).  Suppose 

Ker  T is  of  finite  dimension.  Let 

Al  = {u.  £ W^;  i = l,2,...,n  :<u^,f>  = - f(t)  Vf  c W^} 
and 

^2  = {v.  e W^;  i = l,...,n  ; <v.,f>  ^ f(t^),  Vf  e W^} 

and 

A = {w.  £ W ; i = 0.1 N:  <w..f>  = -Ff(j/N).  Vf  £ W } 

3 1 m 1 

Take  L = A^  u A^  u A^  and  define  a map  p:  L -<■  R by 

p{Uj)  = -a^  u.  £ Aj 

p(Vi)  = V.  £ A^ 

P(w^)  = -e  £ Aj  . 

Now  define  a convex  subset  C of  W by 

m 

C = {f  £ W : VJl  € L,  <il,f>  < p(Jl)}. 
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We  are  seeking  an  element  a e C satisfying 


lTo||  = Min  ||Tf|l. 
feC 


Such  an  element  a will  be  called  the  (generalized)  spline  function  relative  to 

T in  the  set  C.  Let  C = {f  e W : V£  e L <£,f>  < O}.  We  need  the  following 

m — 

assumption 


HI  C n (Ker  T)  = (O). 


Theorem  2:  (M.  Atteia  and  P.  L,  Joly) . Under  assumption  HI  there  exists  at 

least  one  spline  function  relative  to  T in  the  convex  set  C. 


Proof;  Let  = T(C)  and  x = To.  We  must  have  (*)  11t|1  = Min  ||y|l.  The 

yeil 

subset  n is  convex.  If  moreover  is  closed  there  will  exist  a unique  element 
T € £1  satisfying  (*)(it  is  the  projection  of  the  origin  onto  £1).  Thus  a e C 
exists  such  that  To  = t. 


Proof  that  £1  is  closed:  £1  = T(C)  is  closed  iff  (Ker  T)  + C = T (£1)  is  closed 

in  W . Clearly  the  restriction  T of  T to  (Ker  T)  is  a homeomorphism  and 
m 


£1  = T ((Ker  T + C)  n (Ker  T)  •') , 


Dieudonne's  Theorem:  A,B  closed  non-empty,  convex  subsets  of  a T.V.S.  with  A 
locally  compact.  Let  A = X (A  - a)  a € A.  Then  = 0 implies  A - B 


closed. 
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In  the  present  case  Ker  T is  locally  compact  (because  it  is  of  finite  dimension) 
and  we  can  easily  prove  that  (Ker  T)^  = Ker  T and  = C.  Thus  assumption  111 
implies  C + Ker  T is  closed. 

In  order  to  characterize  the  spline  relative  to  T in  C we  need  some  more 

,1 

structure  axioms. 

H2  The  subset  L c is  weakly  compact  and  the  map  p 

is  a continuous  map  of  L (with  weak  topology)  into  R. 

H3  C n Ker  T is  empty 

H4  I = {f  £ W^:  'iZ  € L,  <11, f>  < p(ll)}  is  not  empty. 

We  can  now  put  everything  together  with  the  following  result. 

Theorem  3:  (P.  J.  Laurent).  Under  assumptions  HI,  H2,  H3,  H4  the  element 

C is  a spline  function  (relative  to  T in  C)  if  and  only  if 

-T*Ta  € ^(F^)  where  = {£  £ L;  <«,,a>  = p(*.)} 

and  ^(Fg)  denotes  the  smallest  closed  convex  cone  with  vertex  0 containing  F^; 
and  T*  is  the  adjoint  of  T. 

Proof;  Consider  the  function  f(y)  = fj(y)  + ^2^^^  with  f ^ (y)  = l|Ty||  and 
= 0 if  y £ C,  <»>  if  y C.  We  wish  to  characterize  a such  that 

f(a)  = min  f(y). 

y£W 
^ m 


I 
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The  element  o satisfies  this  iff 

0 e 3f(a) 


where  3f(a)  denotes  the  set  of  subgradients  of  f at  o.  As  f^  is  convex  and 
continuous  and  convex  and  lower  semicontinuous  we  have 

3f(o)  = 3f^(o)  + 3f2(o). 

From  H3  we  have  \ iToj  | 0 and  3fj(a)  is  reduced  to  one  element 

3f^(0)  = . Using  H2,  H3,  H4  it  is  shown  that 

3f2(a)  = (F^)  . 

Finally  the  condition  (♦)  is  satisfied  iff  there  exists  £ 3fj(a)  and 
JI2  e + *>2  = ° i.e.  if 

-T*Ta  e CC  (F^)  . □ 

Proof  of  Theorem  1;  At  the  optimum  o,  | |D"'a|  £ I l^^gl  and  so  by  Lemma  1, 

the  spline  function  satisfying  all  the  constraints  of  L must  be 

We  can  now  translate  the  assumptions  HI -2- 3-4  into  concrete  form  and 
confirm  that  they  hold. 

HI  Since  T is  D™,  Ker  T is  the  set  of  polynomials  of  degree  m-1.  Every  func- 
tion in  C is  zero  at  all  the  points  tj^,  i = l,...,n.  Hence  C n Ker  T - {O} 
(i.e.  HI)  holds  if  n,  the  number  of  data  points  exceeds  m-1. 
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H2  L is  finite,  therefore  compact  in  any  topology,  p;  L -►  R is  automatically 
continuous. 

H3  C n Ker  T is  empty.  This  is  safe  to  assume  since  any  function  in  this  set 
would  automatically  be  the  optimum  solution  of  Theorem  1 without  Theorems 
2 and  3. 

H4  I = {f  e W : V2.  e L,  <(<!•, f)  < pC*-)}  contains  the  function  g of  the  condi- 
m 

tions  to  Theorem  1 and  so  I is  not  empty. 

Theorem  3 shows  that  the  minimizing  element  a which  was  shown  to  exist  in 
Theorem  2 satisfies  -T*Ta  e closed  convex  cone  generated  by  all  £ e L 
corresponding  to  active  constraints.  In  other  words  the  functional 

T*Ta  = -I  d.  £ 

£eL 

with  all  d-  > 0 and  d»  = 0 when  £ is  not  an  active  constraint,  and  all  the 

iL  ““ 

£ e L are  point  evaluation  (delta)  functions.  Thus  T*To  is  zero  at  all  points 
except  those  corresponding  to  active  constraints  (which  become  knots) . Between 
knot  points  T*To  = 0 i.e.  = 0 so  that  a is  a polynomial  of  degree  2m-l 
between  knot  points.  Thus  Theorem  1 is  proved.  D 

§3.6  The  General  Isotonic  Spline 

We  have  already  mentioned  the  general  isotonic  spline,  with  possibly  an 
infinite  number  of  knots,  able  to  occur  anywhere.  The  following  result  makes 
all  the  details  precise. 
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Theorem  4:  Let  > be  the  partial  order  defined  by  ^2‘  there 


exists  g ^ satisfying 


Fg(t)  >0  t e [0.1] 


“i  ^ ®i  i = 1,2,. . . ,n 


Then  the  problem 


Minimize  /q  CD™f)^dt  subject  to 
_<  f(t^)  £ 6j|^;  i = l,...,n  and 
f :-0,  f € 


has  a solution  which  is  a polynomial  spline  of  degree  2m-l  with  knots  at  the 
data  points,  and  in  exceptional  cases  a countable  number  elsewhere. 


Proof : This  follows  the  lines  of  Theorem  1 when  we  take  T = D , 

A,  = {w  e W , t e [0,1];  <w.,f>  = Ff(t),  Vf  e W } and  p(w)  = 0 for  all  w e A . 
3 t m t m o 

The  key  requirement  that  A^  be  weakly  compact  holds  in  this  case,  and  p is 
continuous.  The  rest  is  routine.  0 


§3.7  Applications  of  Theorems  1 and  4 

We  now  present  two  interesting  applications  of  Theorems  1 and  4,  which 
extend  the  partial  results  of  Passow  (1974),  Passow  and  Roulier  (1974). 


Proposition  1:  If  y^  < y2  < ...  < yj^  with  arbitrarily  small  error  bounds,  there 

exists  a globally  monotone  (cubic)  smoothing  spline  of  the  form  of  Theorems  1 
1 2 

and  4,  minimizing  (f**) 
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Proof:  We  only  need  show  there  exists  a suitable  g satisfying  the  assumptions 
of  Theorem  1.  Let 


(|)(t)  = exp  (l/(t^-l))  |t|  < 1 
=0  |t|  > 1 

4»(t)  is  known  to  be  infinitely  differentiable  on  the  whole  real  line  and  so  in 
all  W^,  m ^ 1.  Let  <|>^(t)  = (|>(n)dn,  (j)^  is  in  c”  and  it  is  clear  that 

♦jCt)  =0,  t < -1 

<Jij(t)  = k = /^^(j»(t)dt  when  t^  1 

and  of  course  D<t>,  Ct)  ^ 0 for  all  t.  When  the  conditions  of  this  proposition 
hold,  adding  together  suitably  scaled  translates  of  4),  will  give  a function 
f^(t)  e c“  with  DfjCt)  1 0,  Vt  which  satisfies  f^Ct^)  = y^,  i = l,2,...,n.  If 
£ is  the  error  bound  on  the  y^,  the  function 

g^(t)  = f^(t)  + et  t € [0,1] 

satisfies  Theorems  1 and  4 with  F = D,  m = 2.  D 

Proposition  2:  If  ^ 7^  for  i = 1.2,...,n-2  and  the  y. 

^ ^i+1  ■ ^i  i+2  ■ i+1 

have  arbitrarily  small  error  bounds  then  there  exists  a globally  convex 

1 2 

(quintic)  smoothing  spline  of  the  form  of  Theorem  1 and  4,  minimizing  (f  ^ • 
Proof:  From  the  function  (J)(t)  of  Proposition  1 we  take 


4>2Ct)  = <l>(u)du]dn. 
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When  the  conditions  of  this  proposition  hold,  adding  together  suitably  scaled 
translates  of  4*2 (t)  will  give  a function  f2  e C with  D^f(t)  ^0  all  t which 
satisfies 


= Xi  i = 1,2,. ...n. 


If  e is  the  maximum  error  bound  on  the  the  function 

2 

g2(t)  = f2(t)  * t e [0,1] 


2 

meets  the  assumption  of  Theorem  1 and  4 with  F = D , m = 3. 

Passow  shows  the  existence  of  an  interpolating  spline,  isotonic  and 
occasionally  antitonic  between  knot  points  which  are  data  points  augmented  by 
certain  others.  The  results  of  Propositions  1 and  2 establish  necessary  and 
sufficient  conditions  for  global  isotonicity  and  can  also  be  applied  to 
Passow' s problem  to  obtain  locally  isotonic  or  antitonic  splines  without  the 
artifice  of  extra  knots. 
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Conaludinq  Remarks 

1)  In  connection  with  Theorem  4 it  would  be  very  useful  to  have  necessary  and 
sufficient  conditions  for  a prescribed  minimum  spacing  between  knot  points  or  at 
least  to  guarantee  only  a finite  number  of  knot  points. 

2)  Efficient  algorithms  for  fitting  isotonic  splines  would  be  valuable  for 
converting  this  theory  into  practice.  An  iteration  technique  based  on  adding 
further  linear  constraints  at  key  points  could  easily  be  feasible.  The  inter- 
polating spline  might  be  computationally  more  difficult  than  the  smoothing 
spline  when  an  iterative  method  is  used. 
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